A SEQUENTIAL CONFIDENCE INTERVAL
FOR THE ODDS RATIO


BY 


D. SIEGMUND


TECHNICAL REPORT NO. 7


PREPARED UNDER CONTRACT
N00014-77-C-0306 (NR-042-373)
FOR THE OFFICE OF NAVAL RESEARCH


DTIC 

ELECTE 

APR  9  1980 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 


DISTRIBUTION  STATEMENT  A 
Approved for public release;
Distribution Unlimited




A  SEQUENTIAL  CONFIDENCE  INTERVAL 
FOR  THE  ODDS  RATIO 


by 


D.  Siegmund 


Technical  Report  No.  7 
February  2,  1980 


Prepared  under  Contract 
N00014-77-C-0306 (NR-042-373)
for  the  Office  of  Naval  Research 


D.  Siegmund,  Project  Director 


Reproduction  in  whole  or  in  part  is  permitted  for 
any  purpose  of  the  United  States  Government 


DEPARTMENT OF STATISTICS
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 

Also issued as Technical Report No. 141 under National Science
Foundation Grant MCS77-16974 - Department of Statistics,
Stanford University.


ABSTRACT 


A sequential fixed width confidence interval is proposed for
the log odds ratio of a 2x2 table. It is shown that the proposed
interval  has  asymptotically  the  correct  coverage  probability  and  is 
asymptotically  efficient  uniformly  in  the  unknown  parameters. 
A SEQUENTIAL CONFIDENCE INTERVAL FOR THE ODDS RATIO


1.  Introduction 

For $i = 1, 2$ let $s_{in_i}$ and $f_{in_i} = n_i - s_{in_i}$ be respectively the
numbers of successes and failures in $n_i$ independent Bernoulli trials
with constant success probability $p_i$ (and $q_i = 1 - p_i$) on each trial.
A simple large sample approximate confidence interval for the log odds
ratio, $\log(p_1 q_2 / p_2 q_1)$, is

(1)  $$\log\!\left(\frac{s_{1n_1} f_{2n_2}}{s_{2n_2} f_{1n_1}}\right) \pm z_\alpha \left[\frac{n_1}{s_{1n_1} f_{1n_1}} + \frac{n_2}{s_{2n_2} f_{2n_2}}\right]^{1/2},$$

where $\int_{z_\alpha}^{\infty} (2\pi)^{-1/2} \exp(-x^2/2)\,dx = \alpha/2$ (Cox, 1970, p. 35). The
confidence coefficient, $1 - \alpha$, is asymptotically correct for fixed $p_1, p_2$ as
$\min(n_1, n_2) \to \infty$.
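
As a minimal sketch of interval (1), the following Python function evaluates it from the four cell counts. The function and argument names are illustrative choices that do not appear in the report, and $z_\alpha$ is obtained from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def log_odds_ratio_interval(s1, f1, s2, f2, alpha=0.05):
    """Large-sample interval (1) for log(p1*q2 / (p2*q1)).

    s_i and f_i are the success and failure counts in sample i,
    so n_i = s_i + f_i. (Names are hypothetical, for illustration.)
    """
    if min(s1, f1, s2, f2) <= 0:
        raise ValueError("interval (1) needs every cell count to be positive")
    n1, n2 = s1 + f1, s2 + f2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    center = math.log((s1 * f2) / (s2 * f1))
    half_width = z * math.sqrt(n1 / (s1 * f1) + n2 / (s2 * f2))
    return center - half_width, center + half_width
```

The half-width depends on the observed counts, so with a fixed sample size it cannot be prescribed in advance.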


These intervals have two defects when $p_1$ and $p_2$ may be near 0
or  1.  On  the  one  hand  the  rate  of  approach  to  normality  can  be  very 
slow,  so  that  use  of  asymptotic  theory  is  questionable.  More 
importantly,  however,  even  with  exact  calculations,  no  fixed  sample 
size  design  will  permit  one  to  estimate  the  log  odds  ratio  by  an 
interval  of  preassigned  width  in  these  boundary  cases. 

For  one  binomial  population  with  success  probability  p, 
Robbins and Siegmund (1974) proposed a sequential scheme for obtaining
approximately a confidence interval of preassigned width for
$\log(p/q)$. However, they do not consider the question of the uniformity
of their procedure for $p$ near 0 or 1, when a sequential procedure
would presumably be of greatest value.


The  purpose  of  this  paper  is  to  consider  the  two  population 
analogue  of  the  procedure  of  Robbins  and  Siegmund.  The  procedure 
will be seen to attain asymptotically the required coverage probability
and to be asymptotically efficient uniformly in $0 < p_1, p_2 < 1$.

In  Section  2  the  one-population  case  is  reviewed,  and  the 
results  of  Robbins  and  Siegmund  are  appropriately  strengthened  to 
provide  the  tools  for  the  two-population  problem.  It  is  also  shown 
that Robbins and Siegmund's uncritical acceptance of Haldane's (1955)
modification  of  the  empirical  log  odds  ratio  is  inappropriate  in  the 
sequential  case. 

Section  3  is  concerned  with  the  case  of  two  populations. 
Remarks  about  further  extensions  are  collected  in  Section  4. 
2. One Population

Let $x_1, x_2, \ldots$ be independent with $P\{x_j = 1\} = p$,
$P\{x_j = 0\} = q = 1 - p$ $(j = 1, 2, \ldots)$. Let $s_n = x_1 + \cdots + x_n$ and
$f_n = n - s_n$. For large $n$, $\log(s_n/f_n)$ is approximately normally
distributed with mean $\log(p/q)$ and variance $1/(npq)$. Hence to find a
confidence interval for $\log(p/q)$ of preassigned width, or equivalently
in large samples to estimate $\log(p/q)$ by an estimator with preassigned
variance $1/c$, Robbins and Siegmund (1974) define

(2)  $$T = \inf\{n : s_n f_n \ge nc\}.$$

They propose estimating $\log(p/q)$ by

(3)  $$\log[(s_T + \tfrac{1}{2})/(f_T + \tfrac{1}{2})],$$

which they show is asymptotically normally distributed with mean
$\log(p/q)$ and variance $1/c$ as $c \to \infty$. The modification of the empirical
log odds by adding 1/2 to numerator and denominator was
originally suggested by Haldane (1955) as a bias reducing device in
the fixed sample case. Robbins and Siegmund also show that
$ET \sim c/(pq)$ as $c \to \infty$. This may be interpreted as showing that their
procedure is asymptotically efficient in the sense of requiring
asymptotically about the same number of observations as a fixed sample
procedure chosen to be appropriate for a value $p_0$ which happens
to be the actual value of $p$.
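
The stopping rule (2) and the estimator (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the streaming interface are hypothetical, and the inequality in (2) is taken to be non-strict.

```python
import math


def sequential_log_odds(trials, c):
    """Apply stopping rule (2) to a stream of 0/1 outcomes.

    Stops at the first n with s_n * f_n >= n * c (non-strict inequality
    assumed) and returns (T, estimate), where estimate is (3):
    log((s_T + 1/2) / (f_T + 1/2)).
    """
    s = f = 0
    for n, x in enumerate(trials, start=1):
        if x:
            s += 1
        else:
            f += 1
        if s * f >= n * c:  # rule (2) is satisfied for the first time
            return n, math.log((s + 0.5) / (f + 0.5))
    raise ValueError("stream exhausted before the stopping rule was met")
```

For example, on the alternating stream 1, 0, 1, 0, ... with c = 2 the condition $s_n f_n \ge 2n$ first holds at n = 8, where $s_8 = f_8 = 4$.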

In  this  section  it  is  shown  that  the  asymptotic  normality  of 
(3)  holds  uniformly  over  0  <  p  <  1.  This  is  in  marked  contrast  to 
the  fixed  sample  case,  as  was  noted  in  the  Introduction.  It  will 
also be shown that the analogue of Haldane's bias reducing device in
this sequential context is to subtract 1/2 from numerator and denominator
of the empirical odds ratio. However, for simplicity and because
the  appropriate  modification  for  the  two  sample  case  is  unknown,  in 
most  of  what  follows  only  the  unmodified  empirical  odds  ratio  is 
considered. 

The  main  result  of  this  section  is  Theorem  1.  Lemma  1,  which 
was obtained by Robbins and Siegmund (1974), is of interest in its
own right. It says that the asymptotic efficiency of (2) is uniform
in $0 < p < 1$. Repeated use will be made of the algebraic identity

(4)  $$s_n f_n / n = (q - p)(s_n - np) + npq - (s_n - np)^2 / n.$$
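
The identity can be checked directly. Writing $d_n = s_n - np$ (a shorthand introduced only for this check), so that $s_n = np + d_n$ and $f_n = n - s_n = nq - d_n$, one line of algebra gives:

```latex
\frac{s_n f_n}{n}
  = \frac{(np + d_n)(nq - d_n)}{n}
  = \frac{n^2 pq + n(q - p)\,d_n - d_n^2}{n}
  = npq + (q - p)\,d_n - \frac{d_n^2}{n}.
```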

Theorem 1. For the stopping rule $T$ defined in (2), uniformly in
$0 < p < 1$

$$\lim_{c \to \infty} P\{c^{1/2}[\log(s_T/f_T) - \log(p/q)] \le x\} = \Phi(x),$$

where

$$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} \exp(-u^2/2)\,du.$$

The proof utilizes the following lemmas. For the simple
proof of Lemma 1 based on (4), see Robbins and Siegmund (1974).

Lemma 1. $c < pq\,ET < (c + 1)/[1 - (4c)^{-1}]$.
Lemma 2. There exists a $c_0$ such that for all $c > c_0$ and all $0 < p < 1$

$$(pq)^2 E(T - c/pq)^2 < 4c.$$

Lemma 3. For each $0 < \epsilon < 1$ and $c > c_0$, where $c_0$ is defined as in
Lemma 2,

$$P\{|s_T - pT| > \epsilon\,pq\,T\} < K/\epsilon^2 c,$$

where $K$ does not depend on $\epsilon$ or $c$.

Proof of Lemma 2. Squaring (4) gives

$$\begin{aligned}
(s_n f_n/n - c)^2 &= (q - p)^2 (s_n - np)^2 + (pq)^2 (n - c/pq)^2 + (s_n - np)^4/n^2 \\
&\quad + 2\{(q - p)(s_n - np)(pqn - c) - (q - p)(s_n - np)^3/n - (npq - c)(s_n - np)^2/n\}.
\end{aligned}$$

By the Schwarz inequality and Wald's second moment identity

$$|E\{(s_T - pT)(T - c/pq)\}| \le \{pq\,ET\,E(T - c/pq)^2\}^{1/2}.$$

Hence, since $(s_T f_T/T - c)^2 < 1$, Wald's second moment identity yields

$$\begin{aligned}
1 &> (q - p)^2 pq\,E(T) + (pq)^2 E(T - c/pq)^2 - 2pq|q - p|\{pq\,E(T)\,E(T - c/pq)^2\}^{1/2} \\
&\quad - 2|q - p|\,pq\,ET - 2(pq)^2\,ET,
\end{aligned}$$

or

$$(pq)^2 E(T - c/pq)^2 - 2pq|q - p|\{pq\,ET\,E(T - c/pq)^2\}^{1/2} + pq(q - p)^2\,ET < 1 + 2pq\,ET.$$

Taking square roots in this expression, then rearranging terms and
squaring yields

$$\begin{aligned}
(pq)^2 E(T - c/pq)^2 &< \{(pq\,ET)^{1/2} + (1 + 2pq\,ET)^{1/2}\}^2 \\
&< 1 + 3pq\,ET < 1 + 3(c + 1)/(1 - 1/4c),
\end{aligned}$$

where the last inequality follows from Lemma 1. This completes the
proof.

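Proof of Lemma 3. Let $0 < \delta < 1$ and $n_0 = c/pq$. By Lemma 2

$$P\{|T - n_0| > \delta c/pq\} < (\delta c)^{-2} (pq)^2 E(T - n_0)^2 < 4/\delta^2 c.$$

Hence, by Wald's lemma for the second moment and Lemma 1

$$\begin{aligned}
P\{|s_T - pT| > \epsilon\,pq\,T\} &< 4/\delta^2 c + P\{|s_T - pT| > \epsilon\,pq\,T,\ |T - n_0| \le \delta c/pq\} \\
&< 4/\delta^2 c + P\{|s_T - pT| > \epsilon(1 - \delta)c\} \\
&< 4/\delta^2 c + E(s_T - pT)^2/\epsilon^2 (1 - \delta)^2 c^2 \\
&< 4/\delta^2 c + 2/[\epsilon^2 (1 - \delta)^2 c].
\end{aligned}$$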

Proof of Theorem 1. From the mean value theorem follows

$$\begin{aligned}
c^{1/2}[\log(s_T/f_T) - \log(p/q)] &= c^{1/2}(s_T - pT)/pqT + c^{1/2}[(s_T - pT)/pqT]\left[\frac{pq}{\eta_T(1 - \eta_T)} - 1\right] \\
&= n_0^{1/2}(s_T - pT)/(pq)^{1/2} T + n_0^{1/2}[(s_T - pT)/(pq)^{1/2} T]\left[\frac{pq}{\eta_T(1 - \eta_T)} - 1\right],
\end{aligned}$$

where $|\eta_T - p| \le |T^{-1} s_T - p|$, and as before $n_0 = c/pq$. Hence it suffices
to show that uniformly in $0 < p < 1$

$$\lim_{c \to \infty} P\{n_0^{1/2}(s_T - pT)/(pq)^{1/2} T \le x\} = \Phi(x)$$

and

$$pq/\eta_T(1 - \eta_T) \xrightarrow{P} 1.$$

The second statement follows easily from Lemma 3, and the first may
be obtained by minor modifications in the standard proof of
Anscombe's theorem (e.g., Renyi, 1966, p. 390).

An  asymptotically  more  precise  approximation  to  E(T)  than 
that  provided  by  Lemma  1,  although  one  which  is  decidedly  not  uniform 
in  p ,  is 

(5)  $$pq\,ET = c + \tfrac{1}{2}(p - q)^2 + \tfrac{3}{2}\,pq + o(1) \quad (c \to \infty),$$

which is valid for all $p$ for which $(p/q)^2$ is irrational. This result
follows easily from (4) and Theorem 2 of Lai and Siegmund (1979).
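
Statements about $E(T)$ for rule (2) can be examined numerically without simulation. The sketch below is an illustration rather than anything from the report: it computes $E(T)$ exactly (up to a negligible truncated tail) by a forward recursion over the unstopped states, and compares $pq\,ET$ with the two bounds of Lemma 1. The function name, the tolerance, and the chosen values of $p$ and $c$ are assumptions made for the example, and the inequality in (2) is taken to be non-strict.

```python
def exact_expected_T(p, c, tol=1e-13, n_max=100_000):
    """E(T) for T = inf{n : s_n f_n >= n c}, by forward recursion.

    alive[s] holds P(s_n = s, T > n). Returns (E(T), stopped probability);
    the recursion ends once the unstopped mass falls below tol.
    """
    q = 1.0 - p
    alive = {0: 1.0}
    expected_T = 0.0
    stopped = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for s, w in alive.items():
            for s_new, w_new in ((s + 1, w * p), (s, w * q)):
                if s_new * (n - s_new) >= n * c:  # rule (2) stops here
                    expected_T += n * w_new
                    stopped += w_new
                else:
                    nxt[s_new] = nxt.get(s_new, 0.0) + w_new
        alive = nxt
        if sum(alive.values()) < tol:  # remaining mass is negligible
            break
    return expected_T, stopped


# Illustrative values only: compare pq * E(T) with the bounds of Lemma 1.
p, c = 0.3, 5.0
ET, mass = exact_expected_T(p, c)
pq = p * (1 - p)
print(c, pq * ET, (c + 1) / (1 - 1 / (4 * c)))
```

Because only states with $s_n f_n < nc$ are carried forward, the state space stays small and the recursion is fast even for large $c$.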

As  an  estimator  of  log(p/q),  Haldane  (1955)  considered 

$\log\{(s_n + a)/(f_n + a)\}$ and showed by a Taylor series expansion that the
choice of $a$ minimizing the asymptotic bias of this estimator is
$a = \tfrac{1}{2}$. The following heuristic calculation shows that $a = -\tfrac{1}{2}$ is
appropriate in the present context. The machinery for justifying
this calculation may be found in Pollak and Siegmund (1975). It
should  be  noted  that  this  result  is  appropriate  for  the  stopping 
rule  T  defined  by  (2) .  It  does  not  carry  over  to  the  two-population 
case  discussed  in  Section  3. 
A two term Taylor series expansion gives

$$\begin{aligned}
\log\{(s_T + a)/(f_T + a)\} - \log(p/q) &= (s_T - pT + a)/pT - (f_T - qT + a)/qT \\
&\quad - (s_T - pT)^2/2(pT)^2 + (f_T - qT)^2/2(qT)^2 + o_p(c^{-1}).
\end{aligned}$$

Since $T \sim c/pq$ and hence

$$E\{(s_T - pT)^2/T^2\} \sim (pqc^{-1})^2 E\{(s_T - pT)^2\} = (pqc^{-1})^2\,pq\,ET \sim (pq)^2/c,$$

one obtains

(6)  $$E[\log\{(s_T + a)/(f_T + a)\}] - \log(p/q) \sim E\{(s_T - pT)/pqT\} + c^{-1}(q - p)(a - \tfrac{1}{2}).$$

It is shown below that

$$E\{(s_T - pT)/T\} \sim c^{-1}\,pq(q - p) \quad (c \to \infty),$$

which shows that the right hand side of (6) is $\sim c^{-1}(q - p)(a + \tfrac{1}{2})$,
leading to the optimal choice $a = -\tfrac{1}{2}$.

Let $\xi_T = s_T f_T/T - c$. By (4) and Taylor expansions, one
obtains

$$\begin{aligned}
(s_T - pT)/T &= (q - p)^{-1}(c + \xi_T)\{pqc^{-1} - (pqc^{-1})^2 (T - c/pq) + (pqc^{-1})^3 (T - c/pq)^2 + \cdots\} \\
&\quad - (q - p)^{-1}\,pq + (q - p)^{-1}(s_T - pT)^2/T^2.
\end{aligned}$$

It is easy to see from (4) that $c + E\xi_T = pq\,ET - pq + o(1)$; and
Robbins and Siegmund (1974) have obtained
$E(T - c/pq)^2 = (q - p)^2 c/(pq)^2 + O(1)$. Hence by the asymptotic independence
of $\xi_T$ and $c^{-1/2}(T - c/pq)$ (Lai and Siegmund, 1977),

$$\begin{aligned}
E\{(s_T - pT)/T\} &= (q - p)^{-1}(c + E\xi_T)\{pqc^{-1} - (pqc^{-1})^2 [E\xi_T/pq + 1] + (pqc^{-1})^3 (q - p)^2 c/(pq)^2\} \\
&\quad - (q - p)^{-1}\,pq + (pq)^2/(q - p)c + o(c^{-1}) \\
&\sim c^{-1}\,pq(q - p),
\end{aligned}$$

as claimed.
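
The effect of the constant $a$ on the bias can also be explored numerically. The following sketch is an illustration under stated assumptions, not part of the report: it computes the exact bias of $\log\{(s_T + a)/(f_T + a)\}$ for the one-population rule (2) by forward recursion over the unstopped states. The function name, the tolerance, and the parameter values in the loop are hypothetical choices.

```python
import math


def exact_bias(p, c, a, tol=1e-13, n_max=100_000):
    """Bias of log((s_T + a)/(f_T + a)) as an estimator of log(p/q),
    for T = inf{n : s_n f_n >= n c} (non-strict inequality assumed).

    Needs c > 0 and a > -1: at stopping s_T >= 1 and f_T >= 1.
    The tail with unstopped mass below tol is dropped.
    """
    q = 1.0 - p
    alive = {0: 1.0}  # alive[s] = P(s_n = s, T > n)
    mean_estimate = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for s, w in alive.items():
            for s_new, w_new in ((s + 1, w * p), (s, w * q)):
                f_new = n - s_new
                if s_new * f_new >= n * c:  # stop: T = n
                    mean_estimate += w_new * math.log((s_new + a) / (f_new + a))
                else:
                    nxt[s_new] = nxt.get(s_new, 0.0) + w_new
        alive = nxt
        if sum(alive.values()) < tol:
            break
    return mean_estimate - math.log(p / q)


# Illustrative comparison of Haldane's a = 1/2, no correction, and a = -1/2.
for a in (0.5, 0.0, -0.5):
    print(a, exact_bias(0.3, 20.0, a))
```

At $p = 1/2$ the stopped distribution is symmetric in $(s_T, f_T)$, so the bias vanishes for every $a$; this gives a simple sanity check on the recursion.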


3.  Two  Populations 

Consider again the two population case described in the
Introduction and suppose that observations are taken in pairs, one
from each population, so $n_1 = n_2 = n$, say. This restriction is
stronger  than  necessary,  but  it  simplifies  the  subsequent  analysis. 

It  is  easy  to  modify  the  results  to  accommodate  the  case  in  which 
observations  are  taken  from  the  two  populations  in  an  arbitrary  fixed 
ratio.  It  seems  possible  to  achieve  a  slight  reduction  in  the  total 
expected  sample  size  by  choosing  the  sampling  rates  adaptively,  but 
the  fairly  small  improvement  seems  not  to  be  worth  the  considerable 
complication  in  analysis. 

The  obvious  analogue  of  the  stopping  rule  (2)  is  (cf.  (1)) 


(7)  $$T = \inf\left\{n : n\left(\frac{1}{s_{1n} f_{1n}} + \frac{1}{s_{2n} f_{2n}}\right) \le \frac{1}{c}\right\}.$$


The main results of this section are Theorems 2 and 3, which correspond
respectively to Lemma 1 and Theorem 1 in the single population
case. Theorem 2 shows that $T$ defined by (7) is uniformly asymptotically
efficient and Theorem 3 shows that it asymptotically provides
the correct coverage probability uniformly in $p_1, p_2$.
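
A minimal Python sketch of the paired sampling scheme with stopping rule (7) follows. It is an illustration under stated assumptions: the function name and interface are hypothetical, and the returned interval, the estimate plus or minus $z_\alpha c^{-1/2}$, is the fixed-width form suggested by the normal limit in Theorem 3 rather than a formula quoted from the report.

```python
import math
from statistics import NormalDist


def sequential_log_odds_ratio(pairs, c, alpha=0.05):
    """Apply stopping rule (7) to a stream of paired 0/1 outcomes (x1, x2).

    Stops at the first n with n*(1/(s1*f1) + 1/(s2*f2)) <= 1/c and
    returns (T, estimate, (lower, upper)).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s1 = f1 = s2 = f2 = 0
    for n, (x1, x2) in enumerate(pairs, start=1):
        if x1:
            s1 += 1
        else:
            f1 += 1
        if x2:
            s2 += 1
        else:
            f2 += 1
        a, b = s1 * f1, s2 * f2
        # Rule (7) multiplied through by c*a*b, which avoids dividing by
        # an empty cell; the rule cannot be met while any cell is empty.
        if a > 0 and b > 0 and n * c * (a + b) <= a * b:
            estimate = math.log((s1 * f2) / (s2 * f1))
            half_width = z / math.sqrt(c)  # assumed form, from Theorem 3
            return n, estimate, (estimate - half_width, estimate + half_width)
    raise ValueError("stream exhausted before the stopping rule was met")
```

The interval width $2 z_\alpha c^{-1/2}$ is fixed by the choice of $c$ before sampling begins; only the number of pairs $T$ is random.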

Theorem 2. Uniformly in $0 < p_1, p_2 < 1$,

$$ET \sim c\{(p_1 q_1)^{-1} + (p_2 q_2)^{-1}\} \quad (c \to \infty).$$

The  inequality  in  one  direction  is  a  consequence  of  the 
following  trivial  lemma. 
Lemma 4. For all $0 < p_1, p_2 < 1$ and all $c$

$$ET > c\{(p_1 q_1)^{-1} + (p_2 q_2)^{-1}\}.$$

Proof. From (4), Wald's identity and Jensen's inequality one obtains

$$\begin{aligned}
c^{-1} &\ge E\{T(1/s_{1T} f_{1T} + 1/s_{2T} f_{2T})\} \ge \{E(s_{1T} f_{1T}/T)\}^{-1} + \{E(s_{2T} f_{2T}/T)\}^{-1} \\
&= \{p_1 q_1\,ET - E[(s_{1T} - p_1 T)^2/T]\}^{-1} + \{p_2 q_2\,ET - E[(s_{2T} - p_2 T)^2/T]\}^{-1} \\
&> (ET)^{-1}\{(p_1 q_1)^{-1} + (p_2 q_2)^{-1}\}.
\end{aligned}$$

To obtain asymptotic upper bounds on $E(T)$ it is useful to
define (cf. (2))

$$T_i(c) = \inf\{n : n/s_{in} f_{in} \le 1/c\}.$$

Since $s_{in} f_{in}/n$ increases with $n$, for all $\alpha > 1$ and $\beta > 1$ with
$1/\alpha + 1/\beta = 1$,

(8)  $$T \le \max(T_1(\alpha c), T_2(\beta c)).$$

In what follows $\alpha = (p_1 q_1 + p_2 q_2)/p_2 q_2$ and $\beta = (p_1 q_1 + p_2 q_2)/p_1 q_1$, so

(9)  $$\alpha/p_1 q_1 = \beta/p_2 q_2 = (p_1 q_1 + p_2 q_2)/(p_1 q_1 p_2 q_2) = (p_1 q_1)^{-1} + (p_2 q_2)^{-1}.$$

With these fixed values of $\alpha$ and $\beta$ there is no ambiguity in writing
$T_1$ for $T_1(\alpha c)$ and $T_2$ for $T_2(\beta c)$.
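
The reasoning behind the bound on $T$ by the larger of the two one-sample stopping times can be spelled out in one line. For any $n \ge \max(T_1(\alpha c), T_2(\beta c))$, the monotonicity of $s_{in} f_{in}/n$ gives $n/s_{1n} f_{1n} \le 1/\alpha c$ and $n/s_{2n} f_{2n} \le 1/\beta c$, so that

```latex
n\left(\frac{1}{s_{1n} f_{1n}} + \frac{1}{s_{2n} f_{2n}}\right)
  \le \frac{1}{\alpha c} + \frac{1}{\beta c}
  = \frac{1}{c}\left(\frac{1}{\alpha} + \frac{1}{\beta}\right)
  = \frac{1}{c},
\qquad
\frac{1}{\alpha} + \frac{1}{\beta}
  = \frac{p_2 q_2}{p_1 q_1 + p_2 q_2} + \frac{p_1 q_1}{p_1 q_1 + p_2 q_2}
  = 1 .
```

Hence the two-population rule has already stopped by time $n$, and the particular $\alpha$ and $\beta$ chosen above satisfy the required constraint.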

It  is  now  possible  to  complete  the  proof  of  Theorem  2. 
Obviously  from  (8) 
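(10)  $$E(T) \le E\{\max(T_1, T_2)\} = \int_{\{T_2 < T_1\}} T_1\,dP + \int_{\{T_1 \le T_2\}} T_2\,dP.$$

Let $\epsilon > 0$ be arbitrary. Then

(11)  $$\begin{aligned}
\int_{\{T_1 \le T_2\}} T_2\,dP &\le \int_{\{T_1 \le T_2,\ T_2 \le \beta c(1 + \epsilon)/p_2 q_2\}} T_2\,dP + \int_{\{T_2 > \beta c(1 + \epsilon)/p_2 q_2\}} T_2\,dP \\
&\le (p_2 q_2)^{-1} \beta c(1 + \epsilon) P\{T_1 \le T_2\} + (p_2 q_2)^{-1} \beta c\,P\{T_2 > (p_2 q_2)^{-1} \beta c(1 + \epsilon)\} \\
&\quad + \int_{\{T_2 > (p_2 q_2)^{-1} \beta c(1 + \epsilon)\}} |T_2 - (p_2 q_2)^{-1} \beta c|\,dP.
\end{aligned}$$

By Lemma 2

$$(p_2 q_2)^{-1} \beta c\,P\{T_2 > (p_2 q_2)^{-1} \beta c(1 + \epsilon)\} \le 4(p_2 q_2)^{-1} \epsilon^{-2};$$

and by the Schwarz inequality and Lemma 2 again

$$\begin{aligned}
\int_{\{T_2 > (p_2 q_2)^{-1} \beta c(1 + \epsilon)\}} |T_2 - (p_2 q_2)^{-1} \beta c|\,dP &\le [(p_2 q_2)^{-2} E(p_2 q_2 T_2 - \beta c)^2\,P\{T_2 > (p_2 q_2)^{-1} \beta c(1 + \epsilon)\}]^{1/2} \\
&\le 4(p_2 q_2)^{-1} \epsilon^{-1}.
\end{aligned}$$

Putting these inequalities together with (9), (10), and (11) yields

$$ET \le c\{(p_1 q_1)^{-1} + (p_2 q_2)^{-1}\}(1 + \epsilon + 8/\epsilon^2 c),$$

which completes the proof, as $\epsilon$ is arbitrarily small.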

Theorem 3. For $T$ defined by (7), uniformly in $0 < p_1, p_2 < 1$

$$\lim_{c \to \infty} P\{\sqrt{c}\,[\log(s_{1T} f_{2T}/s_{2T} f_{1T}) - \log(p_1 q_2/p_2 q_1)] \le x\} = \Phi(x).$$

With the help of Lemma 5 below, the proof of Theorem 3 may be
carried out along the same lines as the proof of Theorem 1.

Lemma 5. Let $\nu = \{(p_1 q_1)^{-1} + (p_2 q_2)^{-1}\}^{-1}$. For all $\epsilon > 0$ and all
large $c$ (not depending on $\epsilon$)

$$P\{|\nu T - c| > c\epsilon\} \le 8/c\epsilon^2.$$

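Proof. The proof of Theorem 2 shows that

$$\begin{aligned}
P\{T > c(1 + \epsilon)\nu^{-1}\} &\le P\{T_1 \le T_2,\ T_2 > (p_2 q_2)^{-1} \beta c(1 + \epsilon)\} \\
&\quad + P\{T_2 < T_1,\ T_1 > (p_1 q_1)^{-1} \alpha c(1 + \epsilon)\} \le 4/c\epsilon^2.
\end{aligned}$$

The same upper bound for $P\{T < c(1 - \epsilon)\nu^{-1}\}$ follows by a similar
calculation and the observation that $T \ge \min(T_1(\alpha c), T_2(\beta c))$.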
4. Remarks

(a)  Unpublished  numerical  computations  of  H.  Levene  in  the  one- 
sample  case  show  that  the  asymptotic  theory  of  Section  2  provides 
good approximations for $c \ge 10$ and reasonable ones for $c$ as small as
3.  It  seems  likely  that  similar  results  hold  for  two  populations. 

(b)  The  heuristic  principle  which  suggests  the  stopping  rules  (2) 
and (7) is quite common in the literature of fixed precision estimation
(e.g., Anscombe (1953)), and it leads to reasonable stopping
rules  for  more  complicated  log  linear  models.  However,  the  uniform 
asymptotic  theory  developed  here  seems  to  require  new  ideas  for  very 
simple  extensions. 

One  important  generalisation  is  a  set  of  2x2  tables  with 
equal  odds  ratios.  Appropriate  asymptotic  theory  might  involve  a 
large  number  of  observations  from  each  of  a  small  number  of  tables  or 
a  large  number  of  tables. 

Another  interesting  variation  is  log  linear  regression.  In 
this  case  one  might  also  wish  to  consider  sequential  design  in 
selecting  values  of  the  independent  variable. 